TRANSFORMING TRAINING: A PERSPECTIVE
ON THE NEED AND PAYOFFS FROM COMMON STANDARDS


INTRODUCTION 

Contributions  to  the  training  research  community  occur  at  all  levels,  from  basic  research 
to  integrating  new  advanced  technology  developments.  From  an  application  perspective,  the  use 
of common standards has enormous potential to facilitate comparisons across laboratory and
field studies. In contrast, application-oriented training research has typically been conducted
using idiosyncratic methods unique to particular institutions. This limits the scientific
community’s ability to give the warfighter guidance on which methods and results, compared
across organizations, do indeed yield the best value-added training.
Scientifically  powerful  and  unprecedented  on  a  large  scale,  a  common  set  of  warfighter-valid 
methods  and  standards  would  allow  for  cross-comparison  and  the  leverage  of  laboratory  and 
field  study  results.  This  would  permit  quantifiable  feedback  to  the  warfighters  as  to  which 
training  techniques  and  technologies  should  be  pursued.  Standards  emerging  today  position  the 
training  research  community  on  the  eve  of  this  scientific  breakthrough.  In  the  near  future,  the 
scientific  community  is  likely  to  benefit  from  this  ability  to  routinely  cross-compare  training 
technologies  and  techniques  from  laboratory  training  study  results,  various  operational  training 
implementations,  and  possibly  even  live  exercises.  Retention  and  transfer-of-training  studies 
could  become  routine. 

To  realize  this  scientific  cross-comparison  capability,  common  standards  must  exist  in 
three  primary  areas,  namely, 

(1)  defined  skill  competencies  to  be  assessed, 

(2)  metrics  to  evaluate  those  skill  competencies,  and 

(3)  technology  enablers  to  employ  the  assessment  system  across  training  sites. 

Standards  for  defining  warfighter  competencies  as  well  as  standards  for  assessing 
warfighter  performance  against  those  competencies  must  first  be  established.  Once  warfighters 
have  defined  the  core  competency  skill  set  and  have  devised  metrics  to  measure  performance  on 
those  skills,  employing  those  competencies  and  metrics  as  standards  for  use  across  laboratory 
and  field  studies  enables  the  cross-comparison  of  results  for  a  given  warfighter  mission  area.  Of 
course, this requires that the technology media are in place to permit the implementation.

This report discusses the importance of common standards in permitting cross-comparison of
results, reports a study demonstrating the proof-of-concept using only the common standards that
would be necessary for study implementation at a number of sites, and advocates for technology
enhancements that allow for expanding some of these standards to permit more comprehensive
studies at any given site (Schreiber, Watz, & Bennett, 2003; Watz, Schreiber, Keck, McCall, &
Bennett, 2003).

Common Standard #1: Defining the Competencies.

Determining common standards for the competencies that should be assessed is
straightforward: they are the skills that domain experts identify as necessary to perform their mission.

Note  that  the  training  emphasis  is  on  the  Warfighter  skill  level/competency,  not  on  the  frequency 
with  which  a  Warfighter  practices  an  event  (Chapman,  Colegrove,  &  Greschke,  in  press).  Once 
these  mission  essential  skills  have  been  identified  and  validated,  these  skills  become  the 
foundation  that  all  training  techniques  and  technologies  should  be  evaluated  against. 

Laboratories  and  field  sites  purportedly  training  a  given  mission  area  or  performing 
training  research  in  a  given  mission  area  should  use  these  standardized  competencies  as  the  basis 
for  evaluation.  The  results  from  a  process  that  reliably  produces  these  competency  skill  sets 
provide  the  over-arching  framework  needed  for  defining  a  competency  standard  and  for  defining 
standardized  metrics  to  assess  those  competencies. 

Fortunately,  a  standardized  process  to  define  competencies  already  exists.  Mission 
Essential  Competencies  (MECs)  are  “higher-order  individual,  team,  and  inter-team  competencies 
that  a  fully  prepared  pilot,  crew  or  flight  requires  for  successful  mission  completion  under 
adverse  conditions  and  in  a  non-permissive  environment”  (Colegrove  &  Alliger,  2002).  The 
MEC  process  uses  only  expert  operational  warfighter  inputs  as  data,  the  results  from  which  are 
both  valid  and  reliable  (Alliger,  Beard,  Bennett,  Symons,  &  Colegrove,  in  press;  Alliger,  Garrity, 
See, McCall, & Tossell, 2004; Alliger et al., 2003; Alliger, Colegrove, & Bennett, 2003;
Colegrove  &  Alliger,  2002). 

An  example  MEC  air  superiority  skill  is  Controls  Intercept  Geometry  (CIG);  this  skill 
entails  managing  inter-aircraft  geometries  such  that  the  friendly  aircraft  minimizes  vulnerabilities 
to  the  threats  (while  simultaneously  being  able  to  employ  ordnance  against  the  threat).  In  an  air 
superiority  mission,  perfect  performance,  as  defined  by  subject  matter  experts  (SMEs),  would 
result  in  desirable  outcome  metrics  (e.g.,  no  friendly  mortalities,  all  threats  killed)  with  flawless 
skill  execution. 

Common  Standard  #2:  Metrics  to  Evaluate  Competencies. 

Obviously,  once  the  competency  skill  set  is  defined,  common  standard  metrics  are  needed 
to  assess  the  skill  competencies  defined  by  the  MECs  across  all  training  and  training  research 
installations.  These  metrics  should  exist  at  both  the  outcome  and  process  (skill)  level  and  be 
defined  by  SMEs  as  a  direct  subsequent  step  after  the  MEC  process. 

Consider air superiority: In a point defense mission, the overriding outcome objective is
to keep enemy bomber aircraft from coming within striking distance of the friendly point to be defended. The
next  most  important  outcome  is  to  maximize  the  kill  ratio — ideally  killing  all  threats  while  all 
friendly  aircraft  survive.  To  consistently  achieve  these  two  standard  high-level  outcome 
objectives,  warfighters  must  be  proficient  in  the  MEC  skills.  The  air  superiority  skill  CIG  serves 
as  an  example.  In  the  case  of  the  CIG  skill  defined  by  SMEs,  this  would  result  in,  ideally,  never 
allowing  a  hostile  fighter  aircraft  in  any  zone  depicted  in  Figure  1  while  that  hostile  is  pointing  at 
a  friendly. 

Figure 1. An example standardized metric used to assess the Controls Intercept Geometry MEC skill. The
friendly aircraft (in blue) ideally does not want the threat aircraft to penetrate any of the depicted zones with
an aspect angle over 120 degrees (i.e., pointed at the friendly).

Measuring  the  skill  performance  in  addition  to  the  outcomes  will  best  reveal  how  well 
warfighters  are  performing  at  various  skill  competencies.  For  operational  training,  this  will 
allow for standardized performance competency-based assessment across installations.
Furthermore,  the  training  research  community  is  then  better  prepared  not  only  to  cross-compare 
at  the  outcome  level,  but  also  to  identify  which  alternative  training  techniques  and  technologies 
are  best  for  targeting  which  skills.  With  the  ability  to  cross-compare  at  the  outcome  and  skill 
level,  the  training  community  can  both  determine  which  training  approaches  yield  the  best 
mission  outcomes  and  evaluate  the  specific  skill  improvement  rates  with  the  highest  retention 
and  transfer. 

However, the competencies can be assessed only if the technology is in place
to support implementing those measurement standards across multiple laboratory and field
training units. Moreover, metrics such as CIG must be devised so that they can be captured
in an automated fashion across a number of installations without developing new tools or
customizations at those locations. “Automated performance measurement systems have been a
required feature...but their application has been inconsistent and, in many cases, inadequate...”
(Kelly, 1988, p. 496).

Due to network protocol standards and recent performance measurement technology
research, this era of inconsistency may be ending. That brings us to the third and last major area required
to enable routine scientific cross-comparison of results: the technology enablers with which to
employ the standard metrics for assessing the MECs.

Common Standard #3: Technology Enablers.

In  roughly  the  past  decade,  military  training  units  and  training  research  laboratories 
adopted  networked  simulators  as  a  primary  warfighter  training  method.  In  an  effort  to  connect 
simulators  allowing  engagement  in  a  virtual  environment,  engineers  developed  DIS,  or 
Distributed  Interactive  Simulation  (IEEE,  1995)  and  High  Level  Architecture  (HLA)  standards 
(e.g., Fischer, Case, & Bertin (Eds.), 2001). DIS, or an agreed-upon HLA Run-Time Infrastructure
(RTI) and Federation Object Model (FOM), requires all participating entities to supply
standardized information across the computer network. Since these DIS and HLA network
protocol standards are employed at most networked operational training and training research
locations, a potential medium exists for incorporating standardized competency performance
measurement,  but  it  requires  an  assessment  system  to  capitalize  on  this  opportunity. 
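
As a rough illustration of what "standardized information" means on a DIS network, the sketch below unpacks the fixed header that begins every protocol data unit (PDU). The 12-byte field layout and the PDU type numbers are assumed from IEEE 1278.1-1995; the names `PduHeader`, `parse_pdu_header`, and `pdu_type_name` are hypothetical and are not taken from any system described in this report.

```python
import struct
from dataclasses import dataclass

# Assumed header layout (IEEE 1278.1-1995), big-endian:
# protocol version, exercise ID, PDU type, protocol family (1 byte each),
# timestamp (4 bytes), PDU length (2 bytes), padding (2 bytes).
HEADER_FORMAT = ">BBBBIHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes

# A few PDU types that matter for performance measurement.
PDU_TYPE_NAMES = {1: "EntityState", 2: "Fire", 3: "Detonation"}


@dataclass
class PduHeader:
    protocol_version: int
    exercise_id: int
    pdu_type: int
    protocol_family: int
    timestamp: int
    length: int


def parse_pdu_header(packet: bytes) -> PduHeader:
    """Decode the fixed header at the start of one received packet."""
    if len(packet) < HEADER_SIZE:
        raise ValueError("packet shorter than a DIS PDU header")
    version, exercise, pdu_type, family, timestamp, length, _pad = struct.unpack(
        HEADER_FORMAT, packet[:HEADER_SIZE]
    )
    return PduHeader(version, exercise, pdu_type, family, timestamp, length)


def pdu_type_name(header: PduHeader) -> str:
    """Map the numeric PDU type to a readable name, or "Other" if not listed."""
    return PDU_TYPE_NAMES.get(header.pdu_type, "Other")
```

Because every simulator on the network emits packets with this same header, a passive listener can sort traffic by PDU type without any site-specific customization.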

Schreiber,  Watz,  Bennett,  &  Portrey  (2003)  and  Watz,  Keck,  &  Schreiber  (2004)  discuss 
a Performance Effectiveness/Evaluation Tracking System (PETS) methodology exploiting the
measurement  distribution  opportunity  afforded  by  DIS  and  HLA  so  that  assessment  data  at  any 
DIS/HLA  location  can  be  theoretically  captured.  Simply  stated,  the  PETS  assessment  system  is 
another  entity  on  the  network  adhering  to  the  same  DIS/HLA  network  protocol  standards,  not  for 
the  purpose  of  engaging  other  entities,  but  rather  for  reading  network  traffic  to  use  as  algorithm 
inputs  to  capture  and  record  the  metrics  needed  for  assessing  the  MEC  skills  of  various 
warfighters  participating  on  that  network.  To  provide  an  example  of  how  DIS/HLA  allows  for 
standardized  assessment  of  MECs,  consider  again  the  CIG  skill.  The  warfighter’s  goal  is  to 
minimize  the  CIG  time  while  achieving  mission  objectives.  The  CIG  assessment  rules  defined 
by  SMEs  are  (referring  to  Figure  1): 

1. Identify hostile fighter aircraft and likely weapon load.

2.  Determine  hostile’s  aspect  angle.  If  greater  than  120  degrees  (i.e.,  pointed  at  friendly), 
proceed  with  subsequent  rules. 

3.  Determine  hostile’s  quadrant  (front,  rear,  side). 

4.  Determine  hostile’s  altitude  and  range.  (The  altitude,  range,  and  threat  type  dictate  the 
critical  ranges  for  each  quadrant.) 

5.  Given  the  current  altitude,  range,  and  threat  type,  has  the  hostile  penetrated  within  a 
critical  range  to  friendly  for  that  quadrant  (Y/N)?  If  yes,  increment  time  on  CIG. 
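
The five rules above can be sketched as a small decision function. This is a minimal illustration, not the PETS implementation: the critical range table holds placeholder values, it omits the altitude dependence mentioned in rule 4, and every name (`CRITICAL_RANGE_NM`, `violates_cig`, `time_on_cig`) is hypothetical. Rule 1 is represented only by the `threat_type` label passed in.

```python
ASPECT_THRESHOLD_DEG = 120.0  # rule 2: above this the hostile counts as pointed at the friendly

# Hypothetical critical ranges in nautical miles, keyed by (threat type, quadrant).
# Placeholder values only; per rule 4 a real table would also vary with altitude.
CRITICAL_RANGE_NM = {
    ("threat_a", "front"): 10.0,
    ("threat_a", "side"): 6.0,
    ("threat_a", "rear"): 3.0,
}


def violates_cig(threat_type, aspect_deg, quadrant, range_nm, table=CRITICAL_RANGE_NM):
    """Apply rules 2 to 5 to one friendly/hostile pair at one instant."""
    if aspect_deg <= ASPECT_THRESHOLD_DEG:
        return False  # rule 2: hostile is not pointed at the friendly
    critical = table[(threat_type, quadrant)]  # rules 3 and 4
    return range_nm < critical  # rule 5: inside the critical range


def time_on_cig(samples, tick_seconds):
    """Total time during which at least one hostile violates CIG.

    `samples` is a sequence of ticks; each tick is a list of
    (threat_type, aspect_deg, quadrant, range_nm) tuples, one per hostile.
    """
    violating_ticks = sum(
        1 for tick in samples if any(violates_cig(*hostile) for hostile in tick)
    )
    return violating_ticks * tick_seconds
```

Counting violating ticks and multiplying once by the tick length avoids accumulating floating-point error over a long engagement.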

Therefore,  to  assess  CIG  for  a  given  aircraft/warfighter,  the  following  inputs  are  needed 
from  any  DIS/HLA  network:  Aircraft  type,  force  affiliation  (Red/Blue),  position  (latitude, 
longitude,  and  altitude),  heading,  and  weapon  type.  The  PETS  system  “listens”  in  real-time  to 
the  network  traffic,  “looking”  for  those  inputs,  then  captures  relevant  inputs  to  identify  the 
friendly  and  enemy  fighters  along  with  their  altitudes  and  weapon  types.  The  system 
continuously  calculates  (every  50  msec)  all  aspect  angles  and  ranges  between  friendlies  and 
threats. The end result is a simple determination by the PETS system whether or not any friendly
has  allowed  a  hostile  to  violate  the  abovementioned  CIG  rules,  and  the  system  increments  a  timer 
for  each  friendly  that  does  so.  Outcome  metrics  and  additional  process/skill  metrics  are  captured 
in  the  same  manner  using  the  standardized  DIS/HLA  network  traffic,  which  is  rapidly  becoming 
ubiquitous  in  the  military  training  simulation  community.  MECs,  metrics,  and  the  technology 
enabling  distribution  system  are  the  three  instrumental  standardized  pieces  necessary  to  allow  for 
cross-comparison  of  laboratory  and  field  training  results. 
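
The geometry behind those inputs can be sketched as follows. This is an illustrative approximation under stated assumptions, not the PETS algorithm: it uses a flat-earth conversion from latitude and longitude, ignores altitude when computing range, and takes aspect angle to be measured from the hostile's tail, so that 180 degrees means the friendly is directly off the hostile's nose (consistent with "over 120 degrees" meaning pointed at the friendly). All function names are hypothetical.

```python
import math


def to_local_nm(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Flat-earth approximation: (east, north) offsets in nautical miles from a reference point."""
    north = (lat_deg - ref_lat_deg) * 60.0  # one degree of latitude is about 60 nm
    east = (lon_deg - ref_lon_deg) * 60.0 * math.cos(math.radians(ref_lat_deg))
    return east, north


def range_nm(a_xy, b_xy):
    """Horizontal distance between two (east, north) points."""
    return math.hypot(b_xy[0] - a_xy[0], b_xy[1] - a_xy[1])


def bearing_deg(from_xy, to_xy):
    """Compass bearing (0 = north, clockwise) from one point to another."""
    d_east = to_xy[0] - from_xy[0]
    d_north = to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(d_east, d_north)) % 360.0


def aspect_angle_deg(hostile_xy, hostile_heading_deg, friendly_xy):
    """Angle at the hostile between its tail and the line of sight to the friendly."""
    line_of_sight = bearing_deg(hostile_xy, friendly_xy)
    # Smallest unsigned angle between the hostile's nose and the line of sight.
    off_nose = abs((line_of_sight - hostile_heading_deg + 180.0) % 360.0 - 180.0)
    return 180.0 - off_nose
```

A listener would evaluate `range_nm` and `aspect_angle_deg` for every friendly/threat pair at each update (every 50 msec in the system described above) and pass the results to the rule check.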

Current Work

Since  common  standards  for  MECs,  metrics,  and  the  technology  enablers  of  DIS/HLA 
and  PETS  now  exist,  a  study  utilizing  only  these  standards  is  all  that  is  necessary  to  demonstrate 
a  proof-of-concept  for  permitting  routine  scientific  assessments.  The  results  of  the  study, 
especially  the  lessons  learned,  also  serve  to  highlight  where  the  common  standards  need  to 
mature further. A suitable study was readily identified, one that could not only serve as this
proof-of-concept, but also contribute to the scientific body of research.

Defying  good  business  practice,  DIS/HLA  networked  simulations  are  rapidly  becoming 
the  warfighter  training  medium  of  choice  without  the  backing  of  literature  supplying  objective, 
quantifiable in-simulator performance improvements, a disturbing trend which is far from new
(Waag, 1991). Therefore, a fundamental within-simulator training effectiveness study
documenting  the  learning  taking  place  within  a  distributed  simulation  environment  would 
perfectly  satisfy  current  study  requirements. 

METHODS 


Networked  Simulation  Facility 

The  Distributed  Mission  Operations  (DMO)  training  research  facility  at  the  Warfighter 
Readiness  Research  Division  in  Mesa,  AZ,  provided  the  distributed  simulation  environment  used 
for  the  present  study.  Four  high-fidelity  F-16  simulators  and  one  high-fidelity  Airborne  Warning 
and  Control  System  (AWACS)  were  used  in  conjunction  with  a  computer-generated  threat 
system and an instructor operator station (IOS). Similar to many distributed simulation training
environments, all entities interoperated according to common DIS standards.

The high-fidelity F-16 simulators were Block 30 with a 360 degree out-the-window
visual  display.  The  F-16  display  systems  used  either  SGI®  Onyx2  Reality  Monster 
Visualization  supercomputers  or  pC-Novas  (v2.0)  running  Aechelon  runtime  software.  The 
visual  system  used  high  resolution  photo-realistic  databases  of  the  Sonoran  desert  overlaid  on 
terrain  elevation  data  of  the  region.  The  hardware  in  the  cockpits  was  identical  to  that  found  in 
the  actual  F-16,  as  was  the  software  (Software  Capabilities  Upgrade  [SCU]  version  4). 
Depending  on  the  type  of  mission  to  be  flown,  F-16  weapon  load-outs  for  missions  consisted  of 
differing  combinations  of  the  gun,  the  Air  Intercept  Missile  (AIM-9),  the  Advanced  Medium 
Range  Air-to-Air  Missile  (AMRAAM),  and/or  the  Mk-82  and  Mk-84  general  purpose  bombs.  A 
high-fidelity  AWACS  sensor  simulation  was  also  used  to  provide  a  more  realistic  environment. 
The  high-fidelity  AWACS  station  was  a  Solipsys  MSCT  V.  3.9  networked  to  the  Solipsys  TDF 
V.  2.7.3. 

The  computer-generated  threat  system  used  was  the  Automated  Threat  Engagement 
System  (ATES).  ATES  is  a  real-time  threat  generation  system  for  use  on  a  standard  DIS 
network.  The  ATES  system  uses  aerodynamic  modeling,  atmospheric  models,  radar  models, 
infrared  (IR)  models,  and  data  parameter  tables  for  thrust,  drag,  lift,  etc.  For  the  current  work, 
threat air models were the MiG-29, MiG-27/23, and Su-27 loaded with the AA-8, AA-10a, and
AA-10c air-to-air missiles. Ground threats included the SA-2, SA-6, and SA-8, and antiaircraft
artillery  (AAA).  Threat  aircraft  followed  maneuvers  and/or  scripted  flight  paths  and  reacted  to 
friendly  maneuvers  and  weapons. 

Participants

Operational  F-16  pilots  and  AWACS  controllers  routinely  visit  the  Warfighter  Readiness 
Research  Division  in  Mesa,  AZ  for  participation  in  various  training  research  studies.  For  the 
current  work,  35  operational  F-16  teams  (four  fighter  pilots  fly  as  part  of  a  four-ship  team)  who 
participated  in  five-day  training  research  between  January  2002  and  May  2003  were  used.  The 
mean  number  of  hours  flown  in  the  F-16  was  964  (range  448  to  2088). 

Training  Research  Syllabi 

During  the  data  collection  period,  pilots  flying  the  F-16  simulators  “flew”  one  of  four 
very similar syllabi. Each syllabus consisted of nine sessions, beginning with session one on
Monday  morning  and  ending  with  session  nine  on  Friday  morning.  There  were  two  sessions 
each  day  of  the  five-day  training  week,  except  for  Friday  when  the  participants  had  only  one 
session.  Each  session  entailed  a  one-hour  briefing,  an  hour  of  flying,  and  an  hour  and  a  half 
debriefing. 

The syllabi scenarios could be either offensive or defensive, but all consisted of four
F-16s versus X number of threats. Scenarios were designed with trigger events and situations to
specifically  train  MEC  skills.  These  syllabi  were  developed  with  traditional  methods  using  full 
mission  rehearsal  scenarios  across  a  spectrum  of  probable  air-to-air  missions  and  threats  while 
increasing  the  complexity  of  the  missions  as  the  training  research  week  progressed. 

Training  Research  Week 

Each  syllabus  began  with  a  familiarization  session  (session  one)  to  orient  pilots  to  DMO 
simulator  environment  specifics,  such  as  visual  identification  (ID)  characteristics  and  any 
switchology  differences  due  to  F-16  block  number  or  F-16  mission  software.  The  pilots  required 
very  little  familiarity  training,  since  the  high-fidelity  simulator  layout  closely  resembled  the 
actual  aircraft  and  since  all  the  declarative  and  procedural  knowledge  to  be  operationally 
qualified  to  fly  the  F-16  had  been  learned  by  participants  before  arriving.  Therefore,  after  the 
familiarity  session,  performance  increases  observed  throughout  the  course  of  the  subsequent 
sessions  were  the  result  of  learning  how  and  when  to  best  employ  the  skills  they  had  been  taught 
during  their  Air  Force  career. 

Session  two  (after  the  familiarization  period)  began  with  benchmarks  (i.e.,  a  “pre-test”) 
used  to  measure  pre-training  performance.  The  benchmarks  consisted  of  flying  three  point 
defense  engagements  (see  Figure  2).  All  benchmark  point  defense  scenarios  pitted  the  four 
participant  F-16s  against  eight  threats  (six  hostiles  and  two  strikers);  all  benchmarks  were 
designed  to  be  equally  complex  according  to  the  absolute  complexity  scoring  scheme  outlined  by 
Denning,  Bennett,  and  Crane  (2002). 

Five point defense benchmark scenarios were developed, and the complexity analysis
revealed  that  all  benchmarks  were  indeed  equally  complex.  Unbeknown  to  the  pilots,  for  the 
Friday  benchmarks,  participants  (in  the  same  flight/cockpit  assignment)  flew  the  mirror-image  of 
the  three  benchmarks  that  were  flown  on  Monday. 

Figure 2. Example mirror-image point defense benchmark scenarios (panels Bench-1A and Bench-1B) used for the pre- and post-test.

The  participants’  overriding  goal  for  the  point  defense  benchmark  scenario  was  to 
prevent  the  enemy  striker/bombers  from  reaching  the  base  -  success  being  striker  denial  or  kill. 
The  benchmark  scenarios  were  selected  for  examination  in  the  present  study  as  pre-  and  post-test 
assessments  because: 

(1)  all  the  benchmark  engagements  have  equivalent  levels  of  complexity, 

(2)  three  benchmark  scenarios  occur  at  the  beginning  and  the  end  of  the  week-long  DMT 
syllabus, 

(3)  the  same  pilots  perform  the  benchmark  scenarios  in  the  same  team  positions  at  the 
beginning  and  the  end  of  the  week,  and 

(4) the benchmarks were flown under real-time kill removal and strict data collection rules.


The  MEC-based  building-block  training  began  immediately  after  the  benchmarks  and 
continued  through  the  course  of  the  week.  Participating  teams  were  exposed  to  four  to  eight  full 
engagements  per  session,  with  each  engagement  generally  concluding  with  a  logical  end  such  as 
"Bingo"  (nearly  out  of  fuel),  all  threats  killed,  or  multiple  friendly  losses.  While  these  training 
sessions  emphasized  Defensive  Counter  Air  (DCA)  scenarios,  pilots  also  flew  Offensive  Counter 
Air  (OCA)  and  air-to-ground  missions.  All  engagements  were  flown  versus  simulation  of  actual 
threat  aircraft,  air-to-air  ordnance,  and  surface-to-air  ordnance.  These  30+  engagements  between 
benchmarks provided a very rich environment for air-to-air training and were the equivalent of
flying  more  than  ten  friendly  four-ship  missions,  with  each  mission  opposed  by  8-16  dissimilar 
adversary  aircraft.  The  training  sessions  also  provided  real-time  enemy  kills  and  real-time 
friendly losses. The building block training sessions progressed in complexity by increasing the
number of threat aircraft, the types of threat aircraft, the threat aircraft reactivity/maneuver, and/or
the vulnerability time.

Metrics 

A  primary  goal  of  the  current  work  was  to  demonstrate  a  proof-of-concept  study  relying 
on  only  common  standards  for  obtaining  the  data,  thereby  illustrating  that  the  groundwork  for 
leverage  and  cross-comparison  of  laboratory  and  field  studies  is  possible.  As  such,  the  metric  of 
greatest  interest  is  the  success  or  failure  of  the  MECs,  metrics,  and  technology  enablers  (DIS  and 
PETS)  for  conducting  a  distributed  simulation  study. 

For the secondary goal of providing baseline within-simulator effectiveness data, all
metrics were captured using only the standards-based methodologies discussed above. DIS and PETS
provided  the  standardized  technology  enablers  for  capturing  the  MEC-based  outcome  and  skill 
metrics. 

For  outcome  metrics,  enemy  strikers  reaching  target,  enemy  kills,  friendly  mortalities, 
and  percentage  of  threat  and  friendly  shots  resulting  in  a  kill  were  processed  and  recorded  using 
PETS  and  only  the  information  available  from  the  DIS  network. 
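
As a hedged illustration of one of these outcome metrics, the sketch below computes the percentage of shots resulting in a kill from a list of shot records. The record structure and names (`Shot`, `percent_shots_resulting_in_kill`) are hypothetical; the report does not describe how PETS stores this information.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    shooter_side: str    # "friendly" or "threat"
    killed_target: bool  # True if this shot was credited with a kill


def percent_shots_resulting_in_kill(shots, side):
    """Percentage of one side's shots that produced a kill, or None if that side never fired."""
    side_shots = [s for s in shots if s.shooter_side == side]
    if not side_shots:
        return None
    kills = sum(1 for s in side_shots if s.killed_target)
    return 100.0 * kills / len(side_shots)
```

Returning `None` when a side never fired keeps "no shots taken" distinct from "shots taken, none effective," which would both otherwise read as zero.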

For  skill  metrics,  the  MEC  skill  CIG  and  one  indicator  of  the  MEC  weapons  employment 
skill, weapons launch range, were recorded in the same manner and are reported in the current
work.  To  report  substantially  more  process  metrics  for  more  skills,  a  more  comprehensive 
limited  distribution  technical  report  documenting  within-simulator  learning  is  currently  in 
preparation.  Only  high-level  descriptive  statistics,  in  terms  of  percentage  change,  are  reported 
here. 


RESULTS 

There  were  four  major  result  areas  of  interest.  Each  of  the  first  three  revolved  around  the 
success  or  failure  of  using  the  three  pivotal  common  standard  areas  previously  discussed,  MECs, 
metrics,  and  the  underlying  technology  enabler  system  (DIS/HLA  and  PETS).  Fulfilling  our 
secondary  objective,  the  fourth  result  area  was  to  report  initial  within-simulator  training  results 
that  could  serve  as  a  baseline  for  future  cross-comparisons  when  evaluating  alternative  training 
techniques  and  technologies.  Results  in  each  of  the  four  areas  are  discussed  in  turn. 

The  MEC  process  for  identifying  critical  skills  in  air  superiority  predated  the  study  here 
and,  save  some  delays  in  obtaining  operational  personnel  for  data  collection,  posed  no  issues. 

The  MEC  process  produced  37  skills  required  for  successful  air  superiority  in  actual  combat 
under  adverse  conditions,  thereby  providing  the  common  standard  defined  skill  set  for  warfighter 
air  superiority  metric  development. 

Devising standard metrics for each of the MEC skills required a diverse set of solutions.
Common  standards  for  MEC  air  superiority  outcome  metrics  (e.g.,  strikers  on  target,  kill  ratios) 
were  quickly  and  easily  identified.  Skill  metrics,  however,  were  much  more  varied,  some 
simple,  others  more  complicated.  Some  MEC  air  superiority  skills,  such  as  “weapons 
employment,”  were  rapidly  (and  unanimously)  identified  by  SMEs.  The  result  was  to  capture  a 
number  of  different  data  points  at  weapon  launch  and  weapon  detonate  (e.g.,  launch  range, 
launch  altitude,  launch  airspeed,  distance  between  entities  at  weapon  detonate,  etc.). 

Other  MEC  skills,  such  as  the  CIG  metric  already  described,  required  decomposing  SME 
heuristics  into  rule  sets  suitable  for  translation  into  programmable  code  for  the  PETS  system  to 
capture  off  of  a  standard  DIS/HLA  network.  A  number  of  the  MEC  air  superiority  skills  could 
not  be  converted  into  objective  metrics  for  capture  by  the  PETS  assessment  technology  (e.g., 
communication).  For  these  MEC  air  superiority  skills,  separate  subjective  assessment  tools  were 
used. 


In the end, all MEC air superiority outcome metrics were successfully captured, some
skill metrics were successfully captured either objectively or off-line with subjective tools, and the remaining
skill metrics are still in development with a new subjective assessment system that ultimately is
designed for incorporation into the PETS technology (MacMillan, Entin, & Morley, in press).

The DIS and PETS enabling technologies used to capture the objective metrics worked
generally as expected. The DIS environment met objectives and most expectations, allowing all
entities to interoperate routinely and successfully on over 1,000 simulated engagements with
only a few notable DIS issues. As such, data were successfully captured according to research
protocol for 31 teams. “According to research protocol” was a logistical and control necessity
added  due  to  a  limitation  in  the  DIS  network  protocol  standard.  No  standards  within  the  DIS 
community  exist  to  regulate  the  human  operators  of  the  DIS  distributed  simulation  environments. 
That  is,  under  DIS  and  HLA  common  protocol  standards,  console  operators  are  free  to  act  in 
“God-like”  manners  that  would  largely  invalidate  any  conclusions  drawn  from  metrics  obtained. 
Examples  include  using  “shields,”  regenerating  killed  entities,  reloading  fuel  without  an  aircraft 
visiting  a  tanker,  etc.  These  common  standards  limitations  were  addressed  early  in  this  study  by 
writing specific “research protocols” to be adhered to by all operators. Additionally, early in the
current  research  it  was  discovered  that  DIS  protocol  standards  only  mandate  a  limited  and 
narrowly  focused  set  of  data  to  be  shared  among  entities  on  the  network  (i.e.,  mainly  positional 
and  attributional  data),  thereby  limiting  the  potential  for  the  PETS  system  to  collect  the 
necessary  data  for  some  objective  metrics.  Furthermore,  because  DIS  operates  on  a  broadcasting 
protocol  (i.e.,  no  recipient  confirmation  required),  some  standard  data  packets  could  go  missing, 
resulting  in  that  data  never  being  processed  by  the  PETS  technology. 
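
Because a broadcast protocol gives the listener no acknowledgment, lost packets can only be inferred. One possible heuristic, sketched below, is to flag unusually long silences from an entity that had been sending updates. This is an assumption for illustration, not a method the report attributes to PETS; the threshold is arbitrary, the name `find_update_gaps` is hypothetical, and the check would not notice a single lost one-off event such as a weapon launch.

```python
def find_update_gaps(updates, max_gap_seconds):
    """Return (entity_id, gap_start, gap_end) for each suspiciously long silence.

    `updates` is an iterable of (entity_id, receive_time_seconds) in arrival order.
    """
    last_seen = {}
    gaps = []
    for entity_id, received_at in updates:
        previous = last_seen.get(entity_id)
        if previous is not None and received_at - previous > max_gap_seconds:
            gaps.append((entity_id, previous, received_at))
        last_seen[entity_id] = received_at
    return gaps
```

Flagged intervals could then be excluded from, or annotated in, any metric computed over that period.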

Finally,  though  tangentially  germane  to  the  current  work,  it  is  of  interest  to  note  that 
during  other,  non-related  large-scale  research  exercises,  additional  DIS  issues  would 
occasionally  surface,  such  as  bandwidth  and  DIS  version  control. 

In  summary,  most  engineering  issues  for  this  study  (single  site,  less  than  twenty  entities) 
occurred  not  so  much  with  DIS,  but  with  a  single  simulator  or  the  threat  generation  system  (i.e., 
problem  resided  within  the  simulator  system  itself,  not  the  DIS  network  protocol). 

Contradicting pre-study suppositions, the PETS enabling technology used to collect the
metrics worked quite well for skill metrics, but outcome metrics were much more complicated to
capture. Much of this difficulty could be attributed to occasional inconsistencies in the DIS
network (e.g., the missing network data traffic described above). Other difficulties required updating
the  PETS  technology  to  capture  unique  events  impacting  outcome  metrics  (e.g.,  correctly 
registering  a  kill  when  one  aircraft  chases  another  into  a  mountain  without  shooting  it).  These 
challenges  were  overcome  with  updated  code  and  all  objective  outcome  and  skill  metrics 
reported  here  were  collected  automatically  and  successfully. 

Baseline within-simulator training effectiveness study results revealed that all metrics for
the 31 teams showed improvements in the expected direction. Compared to the Monday
benchmarks  (session  two),  performance  observed  on  the  Friday  benchmarks  (session  nine) 
showed  69%  fewer  F-16  mortalities,  61%  fewer  enemy  bombers  reaching  base,  25%  more 
threats  killed,  10%  longer  range  at  launch  of  missile,  69%  improved  performance  (less  time)  on 
the  MEC  CIG  skill  metric,  55%  fewer  threat  shots  resulting  in  a  kill,  and  7%  more  F-16  shots 
resulting  in  a  kill  (Gehr,  Schreiber,  &  Bennett,  2004). 
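
Each figure above is a percentage change from the Monday benchmark to the Friday benchmark. A minimal sketch of that calculation follows; the function name is hypothetical, and the values in the usage note are invented for illustration and are not the study's data.

```python
def percent_change(pre, post):
    """Percentage change from a pre-test value to a post-test value."""
    if pre == 0:
        raise ValueError("percent change is undefined when the pre-test value is zero")
    return 100.0 * (post - pre) / pre
```

For example, with invented values, `percent_change(2.0, 0.5)` gives -75.0 (a 75% reduction), and `percent_change(40.0, 44.0)` gives 10.0. Whether a negative change counts as an improvement depends on the metric: fewer mortalities is better, while a longer launch range is better.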

DISCUSSION 

The  successful  distributed  simulation  study  reported  here  represents  a  training 
transformational capability to automatically capture objective human performance data from a
DMO  environment.  Relying  only  upon  the  MEC,  metric,  and  enabling  technology  standards, 
this  study  illustrates  the  potential  capabilities  for  the  training  community  in  the  future.  Though 
the  current  work  reported  only  one  study  at  one  location,  the  study  was  conducted  by  relying  on 
standards that should theoretically be easy to apply at another DMO location. Indeed, based
upon this demonstrated feasibility, efforts are currently underway to enable and test these
assessment  capabilities  at  a  sample  of  other  DMO  sites  (e.g.,  Shaw  Air  Force  Base;  Bills  & 
Devol,  2003).  A  demonstration  for  collecting  the  same  standardized  metric  data  is  also  planned 
for  a  live  fly  event  at  Nellis  Air  Force  Base  by  the  end  of  2005. 

With  a  capability  to  standardize  assessing  skill  competencies  across  field  site  and 
laboratory  installations,  the  operational  community  would  be  able  to,  at  any  time  and  at  any 
place,  theoretically  assess  a  warfighter  on  his/her  skill  and  carry  those  results  forward 
longitudinally  and  across  installations.  Furthermore,  the  scientific  community  would  be  afforded 
the  ability  to  cross-compare  study  results  evaluating  alternative  training  techniques  or 
technologies and do so quantitatively, thereby revealing the best value-added training
approaches. 

The  processes  for  MEC  development  and  metric  development,  though  SME-intensive, 
did  not  pose  any  significant  issues  to  conducting  this  study.  This  makes  intuitive  sense,  as  both 
those  common  standards  are  processes  which  result  in  information  standards  to  be  used.  Of 
course,  a  proof-of-concept  study  such  as  this,  while  successful,  was  not  without  complications. 
The success or failure of researchers in taking those information standards and converting them into
application and concrete, valid assessments for this (or any future) study hinges upon the
technology enablers. Therefore, it was expected that the majority of issues encountered would
exist with the DIS and PETS enabling technologies.

The  DIS/HLA  standards  are  rather  limited  (Lacy  &  Tuttle,  1994)  and  should  expand.  In 
DIS  and  commonly  used  HLA  FOMs,  the  typical  data  packet  passed  between  interoperating 
entities  on  the  network  contains  roughly  13  variables.  The  entity  state  packet,  for  example, 
contains  primarily  attribution  and  positional  information  (e.g.,  Su-27,  latitude,  longitude, 
altitude) updated at a given frequency. The variables within these data packets are the only
sources of information that standardized assessment methodologies such as PETS can use
as network inputs for calculating performance metrics. Additional inputs can, however, be taken
from  non-network  sources  such  as  configuration  tables,  as  the  CIG  assessment  algorithm  does 
for  its  required  quadrant  ranges. 
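To illustrate how an assessment might combine network inputs with a configuration table, the sketch below derives a range between two entities from position data and classifies it against configured thresholds. Everything here is an assumption made for illustration: the `EntityState` fields are a simplified stand-in rather than the DIS entity state layout, the `RANGE_CONFIG_M` table and its values are hypothetical, and the logic is not the CIG algorithm.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


@dataclass(frozen=True)
class EntityState:
    """Simplified stand-in for one entity state update (hypothetical fields)."""
    entity_type: str  # attribution, e.g. "Su-27" or "F-16"
    lat_deg: float
    lon_deg: float
    alt_m: float


# Non-network input: hypothetical configuration table of range thresholds.
RANGE_CONFIG_M = {"inner": 20_000.0, "outer": 60_000.0}


def slant_range_m(a: EntityState, b: EntityState) -> float:
    """Approximate distance: haversine ground range combined with altitude difference."""
    phi1, phi2 = math.radians(a.lat_deg), math.radians(b.lat_deg)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon_deg - a.lon_deg)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    ground = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))
    return math.hypot(ground, b.alt_m - a.alt_m)


def range_band(a: EntityState, b: EntityState, config=RANGE_CONFIG_M) -> str:
    """Classify the separation using thresholds from the configuration table."""
    d = slant_range_m(a, b)
    if d <= config["inner"]:
        return "inner"
    if d <= config["outer"]:
        return "outer"
    return "beyond"


# Two entities half a degree of latitude apart at the same altitude.
viper = EntityState("F-16", 36.0, -115.0, 7000.0)
flanker = EntityState("Su-27", 36.5, -115.0, 7000.0)
band = range_band(viper, flanker)
```

Anything the metric needs beyond these fields, such as a weapons load, has no place in such a record, which is the limitation discussed next.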

Since  network  traffic  is  minimal,  the  pool  of  input  variables  for  assessments  is  limited. 
Consider  the  CIG  metric:  All  the  data  required  for  computing  that  metric  is  available  per 
common  DIS/HLA  standards  or  simple  configuration  tables,  except  for  the  threat’s  weapons 
load. Therefore, a custom modification was performed within DIS standards to allow for
capturing and assessing the CIG metric accurately, which is obviously not the desired long-term solution.
The other undesirable option would have been to operate under an assumed weapons load given
the type of threat (the threat type being known from network traffic), but this approach
introduces errors when those assumptions are not true. The current network data traffic
is limited primarily because it was designed to meet only basic interoperability needs and because of bandwidth limitations. Given
standardized  assessment  requirements  and  time  to  allow  technology  enhancements  to  increase 
bandwidth  for  DIS/HLA  environments,  more  MEC  skills  could  be  assessed  using  standardized 
metrics  and  standardized  technology  enablers  such  as  PETS.  If  standards  do  not  expand,  only 
outcome  metrics  and  a  limited  set  of  MEC  skill  metrics  can  be  automatically  and  objectively 
obtained  via  any  standard  DIS/HLA  DMO  network. 
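The two options for the threat's weapons load discussed above (a value carried on the network through a custom modification, or a value assumed from the threat type) can be sketched as a simple lookup with a fallback. The table contents, field names, and function name are hypothetical and chosen only to show where the assumption-based error enters.

```python
from typing import Dict, Optional, Tuple

# Hypothetical assumed loads per threat type (illustrative values only).
ASSUMED_LOAD: Dict[str, Dict[str, int]] = {
    "Su-27": {"radar_missiles": 4, "ir_missiles": 2},
}


def resolve_weapons_load(
    threat_type: str, reported: Optional[Dict[str, int]] = None
) -> Tuple[Dict[str, int], str]:
    """Return (load, source); prefer a reported load over a type-based assumption."""
    if reported is not None:
        return dict(reported), "reported"
    if threat_type in ASSUMED_LOAD:
        return dict(ASSUMED_LOAD[threat_type]), "assumed"
    raise KeyError(f"no reported or assumed load for {threat_type!r}")


# A threat that has already fired two radar missiles and both IR missiles.
actual = {"radar_missiles": 2, "ir_missiles": 0}
load_reported, src_reported = resolve_weapons_load("Su-27", reported=actual)
load_assumed, src_assumed = resolve_weapons_load("Su-27")

# The assumption overstates the remaining weapons, so any metric built on it is wrong.
overstated = load_assumed["radar_missiles"] - load_reported["radar_missiles"]
```

Returning the source alongside the load lets downstream metrics be flagged whenever they rest on an assumption rather than on reported data.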

Perhaps less obvious for cross-comparing and leveraging results between organizations,
standards for administering distributed simulation events should also exist. This is conveyed more clearly
by way of example: consider the “regeneration” capability. Using regeneration from the IOS
during  an  unfolding  scenario  impedes  attempts  to  automatically  collect  outcome  metrics  (e.g., 
kill  ratios).  Even  with  extensive  code  to  accurately  collect  this  information  in  spite  of  IOS 
operator  “God-like”  actions,  the  data  are  rendered  almost  useless  for  interpreting  and  drawing 
conclusions  about  the  training.  Other  IOS  functionalities  carry  similar  assessment  pitfalls,  such 
as  shields,  freezing,  or  relocating  entities.  These  approaches  may  very  well  be  desirable  as  part 
of  a  training  technique  or  strategy,  but  for  measurement  points  (i.e.,  benchmarks)  to  formally 
assess  performance,  the  “realism”  approach  is  best — using  kill  removals,  no  mid-air  weapons 
reloading, no refueling in flight unless done so via tanker, etc. In addition to standardizing
measurement points, this approach also supports stronger conclusions about the
value of the training and allows for more direct comparisons to range exercises.
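One way to act on this point in an automated pipeline is to flag any benchmark in which an IOS intervention occurred, so that its outcome metrics are not pooled with unassisted runs. The sketch below assumes a hypothetical event log format (dictionaries with `type` and `victim_side` keys) and hypothetical event names; it is not the PETS implementation.

```python
from typing import Dict, List, Optional

# Hypothetical names for IOS actions that compromise outcome metrics.
IOS_INTERVENTIONS = {"regenerate", "shield", "freeze", "relocate"}


def summarize_benchmark(events: List[Dict[str, str]]) -> Dict[str, object]:
    """Count kills and losses, and flag runs that contain IOS interventions."""
    kills = sum(1 for e in events
                if e["type"] == "kill" and e.get("victim_side") == "red")
    losses = sum(1 for e in events
                 if e["type"] == "kill" and e.get("victim_side") == "blue")
    interventions = [e["type"] for e in events if e["type"] in IOS_INTERVENTIONS]
    # The ratio is undefined when there are no friendly losses.
    ratio: Optional[float] = kills / losses if losses else None
    return {
        "kills": kills,
        "losses": losses,
        "kill_ratio": ratio,
        "comparable": not interventions,
    }


clean_run = [
    {"type": "kill", "victim_side": "red"},
    {"type": "kill", "victim_side": "red"},
    {"type": "kill", "victim_side": "blue"},
]
# Same engagement, but a downed aircraft was regenerated mid-scenario.
assisted_run = clean_run + [{"type": "regenerate"}]

clean = summarize_benchmark(clean_run)
assisted = summarize_benchmark(assisted_run)
```

Both runs yield the same kill ratio, but only the first is marked comparable, which reflects the argument that the number alone says little once the scenario has been altered from the IOS.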

The  demonstrated  performance  improvement  results  suggest  that  significant  learning  took 
place  in  the  DMO  environment.  These  results  provide  strong  evidence  for  reaffirming  some 
DMO training effectiveness subjective data studies (Bennett, Schreiber, & Andrews, 2002;
Crane, Robbins, & Bennett, 2000; Krusmark, Schreiber, & Bennett, 2004; Waag, Houck,
Greschke, & Raspotnik, 1995), but the conclusions here are taken further by quantifying the
magnitude of in-simulator learning improvement. These objective results show the F-16 teams
were not simply sacrificing performance in one area to improve performance in another area, but
rather that they were improving in both offensive and defensive skills. By the end of the training
week, F-16 teams performed the MEC CIG skill more effectively, they increased weapons
employment effectiveness, and their kill ratios increased, all while launching weapons at longer
ranges and permitting fewer enemy strikers to reach their target.

In  addition  to  learning  other  critical  skills,  it  is  postulated  that  the  F-16  pilots  learned 
where  and  when  to  best  position  their  weapons  systems  in  specific  inter-aircraft  geometries  such 
that  they  could  effectively  employ  their  radar  missiles,  but  simultaneously  avoid  vulnerable 
exposure  to  the  threats’  weapons  engagement  zones.  The  current  study  can  be  used  as  a  baseline 
DMO  training  effectiveness  study  which  other  laboratory  studies  or  operational  DMO  sites  can 
then  compare  against  when  evaluating  alternative  training  approaches. 

Direct  comparisons  to  range  exercises  provide  the  final,  long-term  objective — using  the 
same  standards  for  assessing  performance  and  cross-comparing  results  from  training  research 
laboratories,  operational  training  locations,  and  range  exercises.  For  example,  at  the  Nellis 
Range,  much  of  the  data  for  the  aircraft  participating  in  live-fly  exercises  is  passed  in  a  similar 
manner  to  current  DIS/HLA  network  protocol  standards.  If  the  standards  already  discussed  can 
be  employed  not  only  at  the  DMO  simulation  facilities,  but  also  at  the  live  exercise  ranges,  the 
scientific  potential  for  discovering  the  best  uses  of  DMO  cannot  be  overemphasized.  Objective, 
in-simulator learning assessments could become routine, and thus any systematic change within
or between similar DMO environments could then be objectively assessed. Furthermore,
straightforward transfer-of-training assessments from the DMO environment to the range
would become possible.

It  appears  that  this  cross-comparison  era  is  dawning.  As  mentioned,  DIS  and  HLA  are 
already  commonly  accepted  network  protocol  standards.  The  United  States  Air  Force’s  Air 
Combat  Command  (ACC)  has  called  for  MECs  to  be  developed  for  all  major  Air  Force  weapons 
systems  (over  15  of  which  are  either  in  process  or  completed),  and  the  metrics  and  PETS 
assessment methodology have been identified as ACC’s potential solution for an Air Force-wide
MEC  competency-based  assessment  system.  The  backing  of  these  communities  solidifies  these 
necessary  core  common  standard  areas  as  standards  likely  to  be  implemented  across  a  great 
number  of  military  training  and  training  research  institutions,  creating  a  transformation  for 
training  research,  warfighter  competency-based  training,  and  evaluating  alternative  training 
techniques  and  technologies. 

ACRONYMS


AAA       Antiaircraft Artillery
ACC       Air Combat Command
AIM-9     Air Intercept Missile
AMRAAM    Advanced Medium Range Air-to-Air Missile
ATES      Automated Threat Engagement System
AWACS     Airborne Warning and Control System
CIG       Controls Intercept Geometry
DCA       Defensive Counter Air scenarios
DIS       Distributed Interactive Simulation
DMO       Distributed Mission Operations
FOM       Federated Object Model
HLA       High Level Architecture
ID        Identification
IR        Infrared
IEEE      Institute of Electrical and Electronics Engineers
IOS       Instructor Operator Station
MECs      Mission Essential Competencies
OCA       Offensive Counter Air
PETS      Performance Effectiveness/Evaluation Tracking System
RTI       Real-Time Interface
SCU       Software Capabilities Upgrade